Computation of Initial Modes for K-modes Clustering Algorithm Using Evidence Accumulation
Abstract
The clustering accuracy of partitional clustering algorithms for categorical data depends primarily on the choice of initial data points (modes) used to start the clustering process, so clustering results cannot be reproduced consistently. In this paper we present an approach for computing initial modes for the K-modes partitional clustering algorithm on categorical data sets. We use the idea of evidence accumulation to combine the results of multiple clusterings. Initially, the n F-dimensional data objects are decomposed into a large number of compact clusters. The K-modes algorithm performs this decomposition: several clusterings are obtained from N random initializations of the K-modes algorithm, and the modes obtained for every random initialization are stored in a Mode-Pool, PN. The objective is to investigate the contribution of those data objects (patterns) that are less vulnerable to the random selection of modes, and to choose the most diverse set of modes from the Mode-Pool for use as initial modes for the K-modes clustering algorithm. Experimentally, we found that this method yields initial modes that are very similar to the actual (desired) modes and gives consistent and better clustering results, with less variance of error, than the traditional method of choosing random modes.
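The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the "most diverse" modes are selected, so `pick_diverse_modes` below assumes a greedy farthest-point rule under Hamming distance, and all function names and parameters are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, init_modes, max_iter=20):
    """Plain K-modes: assign rows to the nearest mode, then recompute modes."""
    modes = [tuple(m) for m in init_modes]
    for _ in range(max_iter):
        clusters = [[] for _ in modes]
        for row in data:
            nearest = min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            clusters[nearest].append(row)
        new_modes = []
        for j, members in enumerate(clusters):
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
                continue
            # New mode = most frequent value in each attribute column.
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def build_mode_pool(data, k, n_runs, seed=0):
    """Run K-modes from n_runs random initializations; pool every resulting mode."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_runs):
        pool.extend(k_modes(data, rng.sample(data, k)))
    return pool


def pick_diverse_modes(pool, k):
    """ASSUMPTION: greedy farthest-point selection, not the paper's stated rule.

    Start from the most frequent pooled mode, then repeatedly add the pooled
    mode whose minimum Hamming distance to the chosen set is largest.
    """
    chosen = [Counter(pool).most_common(1)[0][0]]
    while len(chosen) < k:
        remaining = sorted(set(pool) - set(chosen))
        if not remaining:
            break
        chosen.append(max(remaining, key=lambda m: min(hamming(m, c) for c in chosen)))
    return chosen


if __name__ == "__main__":
    # Toy categorical data with two obvious groups (made up for illustration).
    data = [("a", "x", "p")] * 5 + [("b", "y", "q")] * 5
    pool = build_mode_pool(data, k=2, n_runs=10)
    initial_modes = pick_diverse_modes(pool, k=2)
    print(k_modes(data, initial_modes))
```

The selected modes are then passed to a final K-modes run in place of a random seed, which is the step the abstract credits with reducing the variance of the clustering error.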
Similar resources
An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering
In past decades, the detection and prevention of crime required several years of research and analysis. Today, however, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...
Cluster Center Initialization for Categorical Data Using Multiple Attribute Clustering
The K-modes clustering algorithm is well known for its efficiency in clustering large categorical datasets. The K-modes algorithm requires random selection of initial cluster centers (modes) as seed, which leads to the problem that the clustering results are often dependent on the choice of initial cluster centers and non-repeatable cluster structures may be obtained. In this paper, we propose ...
Computing Initial points using Density Based Multiscale Data Condensation for Clustering Categorical data
The K-Modes clustering algorithm [1] has shown great promise for clustering large data sets with categorical attributes. The K-Modes clustering algorithm suffers from the drawback that the initial points (modes) of the clusters are selected at random. Different initial points lead to different cluster formations. In this paper Density-based Multiscale Data Condensation [2] approach with Hamming dist...
Attribute Value Weighting in K-Modes Clustering
In this paper, the traditional k-modes clustering algorithm is extended by weighting attribute value matches in dissimilarity computation. The use of attribute value weighting technique makes it possible to generate clusters with stronger intra-similarities, and therefore achieve better clustering performance. Experimental results on real life datasets show that these value weighting based k-mo...
Selection Initial modes for Belief K-modes Method
The belief K-modes method (BKM) is a new clustering technique that handles uncertainty in the attribute values of objects in both the cluster construction task and the classification task. As with the standard version of this method, the BKM results depend on the chosen initial modes. In this paper, a method for selecting initial modes is therefore developed, aiming at improving the performances of...